global action
- North America > United States > California (0.14)
- Europe > Sweden > Stockholm > Stockholm (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.94)
- Information Technology > Communications > Networks (0.93)
- North America > United States > California (0.14)
- Europe > Sweden > Stockholm > Stockholm (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.96)
- Information Technology > Communications > Networks (0.68)
- North America > United States > California (0.14)
- Europe > Sweden > Stockholm > Stockholm (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.94)
- Information Technology > Communications > Networks (0.93)
- North America > United States > California (0.14)
- Europe > Sweden > Stockholm > Stockholm (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
An Investigation of Offline Reinforcement Learning in Factorisable Action Spaces
Beeson, Alex, Ireland, David, Montana, Giovanni
Expanding reinforcement learning (RL) to offline domains generates promising prospects, particularly in sectors where data collection poses substantial challenges or risks. Pivotal to the success of transferring RL offline is mitigating overestimation bias in value estimates for state-action pairs absent from data. Whilst numerous approaches have been proposed in recent years, these tend to focus primarily on continuous or small-scale discrete action spaces. Factorised discrete action spaces, on the other hand, have received relatively little attention, despite many real-world problems naturally having factorisable actions. In this work, we undertake a formative investigation into offline reinforcement learning in factorisable action spaces. Using value-decomposition as formulated in DecQN as a foundation, we present the case for a factorised approach and conduct an extensive empirical evaluation of several offline techniques adapted to the factorised setting. In the absence of established benchmarks, we introduce a suite of our own comprising datasets of varying quality and task complexity. Advocating for reproducible research and innovation, we make all datasets available for public use alongside our code base.
- North America > United States > Montana (0.05)
- Europe > Ireland (0.05)
- Europe > United Kingdom > England > West Midlands > Coventry (0.04)
- (3 more...)
- Health & Medicine (0.46)
- Education (0.46)
- Transportation (0.46)
Goal-conditioned dual-action imitation learning for dexterous dual-arm robot manipulation
Kim, Heecheol, Ohmura, Yoshiyuki, Kuniyoshi, Yasuo
Long-horizon dexterous robot manipulation of deformable objects, such as banana peeling, is a problematic task because of the difficulties in object modeling and a lack of knowledge about stable and dexterous manipulation skills. This paper presents a goal-conditioned dual-action (GC-DA) deep imitation learning (DIL) approach that can learn dexterous manipulation skills using human demonstration data. Previous DIL methods map the current sensory input and reactive action, which often fails because of compounding errors in imitation learning caused by the recurrent computation of actions. The method predicts reactive action only when the precise manipulation of the target object is required (local action) and generates the entire trajectory when precise manipulation is not required (global action). This dual-action formulation effectively prevents compounding error in the imitation learning using the trajectory-based global action while responding to unexpected changes in the target object during the reactive local action. The proposed method was tested in a real dual-arm robot and successfully accomplished the banana-peeling task.
FAIRO: Fairness-aware Adaptation in Sequential-Decision Making for Human-in-the-Loop Systems
Zhao, Tianyu, Taherisadr, Mojtaba, Elmalaki, Salma
Achieving fairness in sequential-decision making systems within Human-in-the-Loop (HITL) environments is a critical concern, especially when multiple humans with different behavior and expectations are affected by the same adaptation decisions in the system. This human variability factor adds more complexity since policies deemed fair at one point in time may become discriminatory over time due to variations in human preferences resulting from inter- and intra-human variability. This paper addresses the fairness problem from an equity lens, considering human behavior variability, and the changes in human preferences over time. We propose FAIRO, a novel algorithm for fairness-aware sequential-decision making in HITL adaptation, which incorporates these notions into the decision-making process. In particular, FAIRO decomposes this complex fairness task into adaptive sub-tasks based on individual human preferences through leveraging the Options reinforcement learning framework. We design FAIRO to generalize to three types of HITL application setups that have the shared adaptation decision problem. Furthermore, we recognize that fairness-aware policies can sometimes conflict with the application's utility. To address this challenge, we provide a fairness-utility tradeoff in FAIRO, allowing system designers to balance the objectives of fairness and utility based on specific application requirements. Extensive evaluations of FAIRO on the three HITL applications demonstrate its generalizability and effectiveness in promoting fairness while accounting for human variability. On average, FAIRO can improve fairness compared with other methods across all three applications by 35.36%.
- North America > United States > California > Orange County > Irvine (0.14)
- North America > United States > Georgia > Fulton County > Atlanta (0.04)
- Africa > South Sudan > Equatoria > Central Equatoria > Juba (0.04)
- Health & Medicine (1.00)
- Education (1.00)
- Construction & Engineering > HVAC (0.46)
Backdoors to Planning
Kronegger, Martin (Vienna University of Technology) | Ordyniak, Sebastian (Masaryk University Brno) | Pfandler, Andreas (Vienna University of Technology)
Backdoors measure the distance to tractable fragments and have become an important tool to find fixed-parameter tractable (fpt) algorithms. Despite their success, backdoors have not been used for planning, a central problem in AI that has a high computational complexity. In this work, we introduce two notions of backdoors building upon the causal graph. We analyze the complexity of finding a small backdoor (detection) and using the backdoor to solve the problem (evaluation) in the light of planning with (un)bounded plan length/domain of the variables. For each setting we present either an fpt-result or rule out the existence thereof by showing parameterized intractability. In three cases we achieve the most desirable outcome: detection and evaluation are fpt.
- Europe > Austria > Vienna (0.14)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > Czechia > South Moravian Region > Brno (0.04)